ABSTRACT


We construct error distributions for a compilation of 232 Large Magellanic
Cloud (LMC) distance moduli values from de Grijs et al. (2014) that give an
LMC distance modulus of $(m - M)_0 = 18.49 \pm 0.13$ mag (median and $1\sigma$
symmetrized error). Central estimates found from weighted mean and median
statistics are used to construct the error distributions. The weighted mean
error distribution is non-Gaussian (flatter and broader than Gaussian), with
more (less) probability in the tails (center) than is predicted by a Gaussian
distribution; this could be the consequence of unaccounted-for systematic
uncertainties. The median statistics error distribution, which does not make
use of the individual measurement errors, is also non-Gaussian (more peaked
than Gaussian), with less (more) probability in the tails (center) than is
predicted by a Gaussian distribution; this could be the consequence of
publication bias and/or the non-independence of the measurements. We also
construct the error distributions of 247 SMC distance moduli values from
de Grijs & Bono (2015). We find a central estimate of
$(m - M)_0 = 18.94 \pm 0.14$ mag (median and $1\sigma$ symmetrized error),
and similar probabilities for the error distributions.


1. Introduction 


The LMC is a widely studied nearby extragalactic setting with a plethora of stellar
tracers. The closeness of the LMC and the abundance of tracers has resulted in a large
number of distance measurements to this nearby galaxy. As the LMC distance provides an
important low rung of the cosmological distance ladder, it is of great interest to study
collections of LMC distance moduli measurements. Following Schaefer (2008), de Grijs et al.
(2014) compiled a list of 237 LMC distance moduli published during 1990-2014^1 and used
these data to examine the effects of publication bias and correlation between the
measurements. They conclude that the overall effect of publication bias is not strong,
although there are significant effects due to measurement correlations, especially in some
individual tracer (smaller) subsamples.


In this paper we extend and complement the analysis of de Grijs et al. (2014) by
constructing and studying the error distributions of the full (232 measurements) sample and
two individual tracer subsamples of the de Grijs et al. (2014) compilation. More specifically,
we examine the Gaussianity of these error distributions.^2 We begin by following Chen et al.
(2003) and Crandall & Ratra (2015) and construct an error distribution, a histogram of
measurements as a function of $N_\sigma$, the number of standard deviations that a
measurement deviates from a central estimate. This is similar to the z score analysis of
de Grijs et al. (2014); however, we use a central estimate from the data compilation itself
whereas de Grijs et al. (2014) use two published values that are assumed to well represent
the measurements. We use two techniques to find the central estimate: weighted mean and
median statistics. Since median statistics does not make use of individual measurement
error bars, median statistics constraints are typically weaker than weighted mean ones, but
are more reliable in the presence of unaccounted-for systematic errors.


We find more extended probability tails in the weighted mean error distributions. For the
median 232 measurements case, we find that the distribution is narrower than a Gaussian
distribution at small (intermediate) $N_\sigma < 2$ where the probability is higher (lower)
than expected for a Gaussian distribution (a similar effect is seen in the smaller
sub-samples we study). We attempt to analytically categorize these distributions by fitting
to well-known non-Gaussian distributions: Cauchy, Student’s t, and the double exponential.
Using a Kolmogorov-Smirnov (KS) test, the fits are poor (< 0.1%) for the Cauchy and double
exponential cases. A Student’s t case with $n = 39$ gives a probability of 21%, and is the
best fit.


Given that the weighted mean error distributions are significantly non-Gaussian, it is
proper to focus more on our median statistics central estimate results. In this case, for all
three data sets, the error distributions are narrower than Gaussian. This could be the result
of mild publication bias, or more likely, as argued by de Grijs et al. (2014), the consequence
of correlations between measurements.


1 Five of the de Grijs et al. (2014) entries do not have error bars, so here we only consider the 232
measurements that do.

2 Conventionally one assumes a Gaussian distribution of errors. For instance, this is used when determining
constraints from CMB anisotropy data (see e.g., Ganga et al. 1997; Ratra et al. 1999; Chen et al. 2004;
Bennett et al. 2013) and has been tested for such data (see e.g., Park et al. 2001; Ade et al. 2015). Schaefer
(2008) also assumes the LMC distance moduli measurement errors have a Gaussian distribution.

In Section 2 we summarize our methods of graphically and numerically describing the
error distribution of the distance moduli values. Section 3 describes our findings from
analyses of the distribution of all 232 measurements. Sections 4 and 5 summarize our analyses
of the two individual tracer subsamples. In Section 6 we describe the results found using SMC
distance moduli measurements from de Grijs & Bono (2015). We conclude in Section 7.


2. Summary of Methods 


Of the 237 LMC distance moduli values collected by de Grijs et al. (2014), five do not
have a quoted error. For our analyses here we use the 232 measurements with symmetric
statistical error bars. To determine the error distribution of the 232 measurements we must
first find a central estimate. We do this using two statistical techniques: weighted mean and
median statistics.


The weighted mean (Podariu et al. 2001) is

$$D_{\rm wm} = \frac{\sum_{i=1}^{N} D_i/\sigma_i^2}{\sum_{i=1}^{N} 1/\sigma_i^2}, \tag{1}$$

where $D_i$ is the distance modulus and $\sigma_i$ is the one standard deviation error of
$i = 1, 2, \ldots, N$ measurements. While de Grijs et al. (2014) use only the quoted
statistical error, in our analyses $\sigma_i$ is the quadrature sum of the systematic (if
quoted) and statistical errors. Since many do not quote a systematic error,^3 and if one is
stated it is small, the difference is not large. The weighted mean standard deviation is

$$\sigma_{\rm wm} = \left( \sum_{i=1}^{N} \frac{1}{\sigma_i^2} \right)^{-1/2}. \tag{2}$$


3 de Grijs et al. (2014) note that only 49 measurements have a quoted non-zero systematic error, and four
additional measurements include systematic uncertainties in their error. The significance of this is considered
in Section 7.
We can also determine a goodness of fit, $\chi^2$, by

$$\chi^2 = \frac{1}{N-1} \sum_{i=1}^{N} \frac{(D_i - D_{\rm wm})^2}{\sigma_i^2}. \tag{3}$$

The number of standard deviations that $\chi$ deviates from unity is a measure of good-fit and
is given by

$$N_\sigma = |\chi - 1| \sqrt{2(N - 1)}. \tag{4}$$
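As a minimal sketch of Eqs. (1)-(4), the Python functions below compute the weighted mean, its standard deviation, the $\chi^2$ of Eq. (3), and the corresponding $N_\sigma$. The function names and the two toy measurements are illustrative assumptions and are not taken from the de Grijs et al. (2014) compilation.

```python
import math

def weighted_mean(values, errors):
    """Return (D_wm, sigma_wm) following Eqs. (1) and (2)."""
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    d_wm = sum(w * d for w, d in zip(weights, values)) / total
    return d_wm, total ** -0.5

def reduced_chi2(values, errors, d_wm):
    """Return chi^2 divided by (N - 1), Eq. (3)."""
    n = len(values)
    return sum(((d - d_wm) / e) ** 2 for d, e in zip(values, errors)) / (n - 1)

def chi_n_sigma(chi2, n):
    """Return the number of standard deviations chi deviates from unity, Eq. (4)."""
    return abs(math.sqrt(chi2) - 1.0) * math.sqrt(2.0 * (n - 1))

# Hypothetical toy measurements (mag), for illustration only.
moduli = [18.40, 18.60]
sigmas = [0.10, 0.10]
d_wm, sigma_wm = weighted_mean(moduli, sigmas)
chi2 = reduced_chi2(moduli, sigmas, d_wm)
```

With the values quoted for the full sample in Section 3 ($\chi^2 = 3.00$, $N = 232$), `chi_n_sigma(3.00, 232)` evaluates to about 15.7.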


The median statistics technique is beneficial because it does not make use of the
individual measurement errors. As a consequence, however, it results in a larger uncertainty
on the central estimate than for the weighted mean case. To use median statistics we assume
that all measurements are statistically independent and have no systematic error as a whole.
A measurement then has a 50% chance of either being below or above the median value. For
a detailed description of median statistics see Gott et al. (2001).^4
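The 50% argument above implies that the number of measurements falling below the true median is binomially distributed with $p = 1/2$. The following Python sketch uses that fact to bracket the median with a range containing at least 68.3% of the probability. It is a simplified illustration: the function name is hypothetical, the range is restricted to measured values without interpolation, and the exact conventions of Gott et al. (2001) may differ.

```python
import math

def median_with_range(values, level=0.683):
    """Return (median, lower, upper) with a central range of at least `level`."""
    x = sorted(values)
    n = len(x)
    median = 0.5 * (x[(n - 1) // 2] + x[n // 2])
    tail = 0.5 * (1.0 - level)
    # If k of the n measurements lie below the true median, k is binomial
    # with p = 1/2, so P(k) = C(n, k) / 2^n.
    cumulative = 0.0
    lower_index = 0
    for k in range(n):
        cumulative += math.comb(n, k) / 2 ** n
        # cumulative is now P(true median < x[k]) for the 0-based sorted list.
        if cumulative > tail:
            # x[k] cuts off too much probability, so step back one value.
            lower_index = max(k - 1, 0)
            break
    upper_index = n - 1 - lower_index  # symmetric choice on the upper side
    return median, x[lower_index], x[upper_index]
```

A symmetrized error can then be taken as half the width of the returned range, which is the sense in which a range such as 18.32 mag to 18.59 mag corresponds to roughly ±0.13 mag.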


Once a central estimate is found, we can construct an error distribution using $N_\sigma$
defined as

$$N_{\sigma_i} = \frac{D_i - D_{\rm CE}}{(\sigma_i^2 + \sigma_{\rm CE}^2)^{1/2}}, \tag{5}$$

where $D_{\rm CE}$ is the central estimate of $D_i$, either $D_{\rm wm}$ or $D_{\rm med}$, and
$\sigma_{\rm CE}$ is the error of the central estimate, either $\sigma_{\rm wm}$ or
$\sigma_{\rm med}$. Here $D_{\rm med}$ is the median distance modulus, with 50% of the
measurements being above it and 50% below, and $\sigma_{\rm med}$ is defined as in Gott et al.
(2001) such that the range $D_{\rm med} \pm \sigma_{\rm med}$ includes 68.3% of the
probability. de Grijs et al. (2014) consider a similar variable, a “z score”. Their z score
is different in that they use two reference values for their central estimate (Freedman et al.
(2001) and Pietrzyński et al. (2013)) while we use the weighted mean and median central
estimates. de Grijs et al. (2014) assume that the reference values are a good representation
of the distance moduli measurements. Therefore they use the z score assuming Gaussianity. We
do not assume Gaussianity as our central estimates are found directly from the collected
de Grijs et al. (2014) data using our statistical techniques.
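The construction of $N_\sigma$ in Eq. (5), and the summary numbers quoted later (the fraction of values within a given $|N_\sigma|$ and the $|N_\sigma|$ limit enclosing a given fraction), can be sketched in Python as follows. The helper names and the rule used to pick the enclosing limit are assumptions made for illustration.

```python
import math

def n_sigma(values, errors, center, center_error):
    """Signed deviations from the central estimate, Eq. (5)."""
    return [(d - center) / math.sqrt(e ** 2 + center_error ** 2)
            for d, e in zip(values, errors)]

def fraction_within(n_sig, limit):
    """Fraction of measurements with |N_sigma| < limit."""
    return sum(abs(v) < limit for v in n_sig) / len(n_sig)

def abs_limit(n_sig, level):
    """Smallest observed |N_sigma| at or below which at least `level` of the values lie."""
    ordered = sorted(abs(v) for v in n_sig)
    index = math.ceil(level * len(ordered)) - 1
    return ordered[index]
```

For a Gaussian error distribution one would expect `fraction_within(n_sig, 1.0)` to be close to 0.683 and `abs_limit(n_sig, 0.683)` to be close to 1.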

To numerically describe the error distribution, we use a nonparametric Kolmogorov-
Smirnov (KS) analysis. This is used to test the compatibility of a sample distribution to a
reference distribution. This can be used in two ways, with binned or un-binned data.^5 The
test compares the LMC distance moduli measurements to a well-known distribution function.


4 For applications and discussions of median statistics see Chen & Ratra (2003), Mamajek & Hillenbrand
(2008), Chen & Ratra (2011), Calabrese et al. (2012), Croft & Dailey (2015), Andreon & Hurn (2012),
Farooq et al. (2013), Crandall & Ratra (2014), Ding et al. (2015), Colley & Gott (2015), and Sereno (2015).

To set conventions, we first note that the Gaussian probability distribution function is

$$P(|x|) = \frac{1}{\sqrt{2\pi}} \exp(-|x|^2/2). \tag{6}$$

It will also be of interest to consider other well-known distributions. These include the
Cauchy (or Lorentzian) distribution

$$P(|x|) = \frac{1}{\pi} \frac{1}{1 + |x|^2}. \tag{7}$$

This distribution is a popular choice for a widened distribution compared to the Gaussian.
The Cauchy distribution has large extended tails with an expected 68.3% and 95.4% of the
values falling within $|x| < 1.8$ and $|x| < 14$ respectively. The Student’s t distribution
is described by


$$P(|x|) = \frac{\Gamma[(n+1)/2]}{\sqrt{\pi n}\, \Gamma(n/2)} \frac{1}{(1 + |x|^2/n)^{(n+1)/2}}. \tag{8}$$

Here $n$ is a positive parameter^6 and $\Gamma$ is the gamma function. When $n \to \infty$
this becomes the Gaussian distribution. When $n = 1$ it is the Cauchy distribution. Thus, for
$n > 1$, it is a function with extended tails, but less so than that of the Cauchy
distribution. The last distribution that we consider is the double exponential. This is given by


$$P(|x|) = \frac{1}{2} \exp(-|x|). \tag{9}$$

The double exponential falls off less rapidly than a Gaussian distribution, but faster than a
Cauchy distribution. For this distribution 68.3% and 95.4% of the values fall within $|x| < 1.2$
and $|x| < 3.1$ respectively.

The comparison between the sample and the assumed distribution yields a p-value (or
probability) that the two are drawn from the same distribution.


5 It is more conventional to use un-binned data for this test, but for completeness we have used both (see
Sec. 5.3.1 of Feigelson & Babu (2012)).


6 The inclusion of n reduces the number of degrees of freedom by 1. 
3. Error Distribution for Full Dataset


When using weighted mean statistics, the 232 LMC distance moduli yield a central
estimate of $(m - M)_0 = 18.49 \pm 3.11 \times 10^{-3}$ mag. We also find $\chi^2 = 3.00$ and
the number of standard deviations that $\chi$ deviates from unity is $N_\sigma = 15.7$. For the
median case we find a central estimate of $(m - M)_0 = 18.49$ mag with a $1\sigma$ range of
18.32 mag $< (m - M)_0 <$ 18.59 mag. Our central estimates are in good accord with de Grijs
et al. (2014) who quote $(m - M)_0 = 18.49 \pm 0.09$ mag.^7


Figure 1 shows the error distribution of the 232 measurements. These are shown as
a function of $N_\sigma$, Eq. (5), the number of standard deviations the measured value deviates
from the central estimate. In Fig. 1 we show the error distributions for the weighted mean
and median central estimates.^8 In both cases we also plot the symmetrized distribution as
a function of $|N_\sigma|$. For a more detailed perspective of the distribution, see Fig. 2 (with
$|N_\sigma| = 0.1$ bin size).


Figure 1 shows that for the weighted mean case the distribution has a more extended
tail than expected for a Gaussian distribution. In fact, for a set of 232 values, a Gaussian
distribution should yield 11 values with $|N_\sigma| > 2$, one value with $|N_\sigma| > 3$, and
none with $|N_\sigma| > 4$. However, we find 42 values with $|N_\sigma| > 2$, 23 with
$|N_\sigma| > 3$ and nine with $|N_\sigma| > 4$ for the weighted mean case. We also note that
68.3% of the observed weighted mean $N_\sigma$ error distribution falls within
$-1.37 < N_\sigma < 1.26$ while 95.4% lies within $-3.37 < N_\sigma < 4.57$. The observed
weighted mean $|N_\sigma|$ error distribution has 68.3% and 95.4% limits of $|N_\sigma| < 1.33$
and $|N_\sigma| < 3.63$ respectively, and 56.5% and 81.9% of the values fall within
$|N_\sigma| < 1$ and $|N_\sigma| < 2$ respectively. These results clearly indicate that the
weighted mean error distribution is non-Gaussian and so the weighted mean technique is
inappropriate for an analysis of these data.
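The expected Gaussian counts quoted above follow from the two-sided Gaussian tail probability, $N \,{\rm erfc}(t/\sqrt{2})$ for a threshold $|N_\sigma| > t$. A short Python sketch (the function name is an illustrative assumption) is:

```python
import math

def expected_outside(n_values, threshold):
    """Expected number of |N_sigma| > threshold for n_values Gaussian draws."""
    return n_values * math.erfc(threshold / math.sqrt(2.0))

thresholds = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 4.0]
expected = [round(expected_outside(232, t)) for t in thresholds]
```

For 232 values this gives 11, 1, and 0 for thresholds of 2, 3, and 4, in agreement with the numbers quoted in the text.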


The median case is narrower than Gaussian, with seven values of $|N_\sigma| > 2$ and none
with $|N_\sigma| > 3$. 68.3% of the data falls within $-0.86 < N_\sigma < 0.63$ while 95.4% lies
within $-1.97 < N_\sigma < 1.27$. The $|N_\sigma|$ error distribution has 68.3% and 95.4% limits
of $|N_\sigma| < 0.72$ and $|N_\sigma| < 1.66$ respectively, and 80.6% and 97.0% of the values
fall within $|N_\sigma| < 1$ and $|N_\sigma| < 2$ respectively. The median technique is more
appropriate because of the non-Gaussianity of the distributions; however, de Grijs et al.
(2014) note that there are correlations between measurements (especially among measurements
of the same tracer type). These correlations mean that the measurements are not statistically
independent, and the errors associated with the median will need to be slightly adjusted to
account for this. Regardless, the narrowness of the median distribution is clearly consistent
with the presence of such correlations.


7 de Grijs et al. (2014) use a collection of 233 distance moduli values for their estimate from years 1990
to 2013, dropping four 2014 measurements.

8 The larger $\sigma_{\rm CE}$ for the median case results in a narrower distribution, see Eq. (5).

Since the distribution for the weighted mean case is broader than Gaussian while the 
median distribution is narrower than a Gaussian, it is of interest to try to fit these observed 
distributions to well-known non-Gaussian distributions. 


To set conventions, we first consider a Gaussian probability distribution function. In
this case 68.3% of the values have $|N_\sigma| < 1$. The Gaussianity of the distribution can be
established by taking a quantitative look at the spread of values. However, the probability
given by the KS test is < 0.1% for the data set (see Table 1). Our first non-Gaussian
distribution, the Cauchy (or Lorentzian) distribution, also has a probability of < 0.1%.

Next we consider a distribution with extended tails, but less so than the Cauchy
distribution, the Student’s t distribution. Fitting to this function yields a probability of 21%
(corresponding to a Student’s t distribution with $n = 39$) for a binned KS test. This may
appear odd, as we have argued that for the median case the distribution is narrower than
a Gaussian distribution, while the Student’s t distribution is known for extended tails. To
explain this we examine the kurtosis of the $|N_\sigma|$ distribution. We use the common
definition of kurtosis (see Eq. 37.8b of Olive et al. 2014)

$$k = \frac{m_4}{m_2^2}, \tag{10}$$

where the fourth and second moments are

$$m_4 = \frac{1}{n} \sum_{i=1}^{n} \left( |N_{\sigma_i}| - \overline{|N_\sigma|} \right)^4 \tag{11}$$

and

$$m_2 = \frac{1}{n} \sum_{i=1}^{n} \left( |N_{\sigma_i}| - \overline{|N_\sigma|} \right)^2. \tag{12}$$

Here $\overline{|N_\sigma|}$ is the mean of the $n$ $|N_{\sigma_i}|$ values. For a detailed
discussion of kurtosis see Balanda & MacGillivray (1988). The kurtosis can be defined as a
measurement of the peakedness, or that of the tail width of a distribution. For example, a
large kurtosis would represent a distribution with more probability in the peak and tails than
in the “shoulder” (Balanda & MacGillivray 1988). A Gaussian distribution has $k = 3$, and
$k > 3$ represents a leptokurtic distribution with a high peak and wide tail.^9 For the median
case, we find $k = 5.57$. This may explain why this case favors a Student’s t fit, as a
Student’s t distribution also has a large kurtosis, i.e. wider tails and a higher peak. The
median statistics distribution appears to favor this fit because its kurtosis is greater than
that for a Gaussian distribution, even though it is narrower than a Gaussian.


9 Often an “excess kurtosis” is used to describe the peakedness of a distribution. This is simply three
subtracted from the standard kurtosis, and is used to compare to a normal distribution (which would have
an excess kurtosis of zero).
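The kurtosis of Eqs. (10)-(12) is simple to evaluate numerically. The sketch below is a generic implementation of that moment ratio; the function name is an assumption, and in the analysis above the input would be the list of $|N_{\sigma_i}|$ values.

```python
def kurtosis(values):
    """Ratio of the fourth central moment to the squared second central moment."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2
```

Subtracting 3 from the returned value gives the excess kurtosis, which is zero for a Gaussian distribution.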





Fig. 1. Histograms of the error distribution in half standard deviation bins. The top
(bottom) row uses the weighted mean (median) of the 232 measurements as the central
estimate. The left column shows the signed deviation, where positive (negative) $N_\sigma$
represents a value that is greater (less) than the central estimate. The right column shows
the absolute symmetrized distributions. The smooth curve in each panel is the best-fit Gaussian.





















Table 1. K-S Test Probabilities

Function^a            Data Set                   Un-binned            Binned
                                                 Probability (%)^b    Probability (%)^b

Gaussian              Whole (232)                < 0.1                < 0.1
                      Truncated (223)            < 0.1                < 0.1
                      Cepheids (81)              1.5                  < 0.1
                      Truncated Cepheids (75)    2.8                  0.10
                      RR Lyrae (63)              1.5                  < 0.1
                      Truncated RR Lyrae (58)    0.8                  < 0.1
Cauchy                Whole (232)                < 0.1                < 0.1
                      Truncated (223)            < 0.1                < 0.1
                      Cepheids (81)              1.0                  < 0.1
                      Truncated Cepheids (75)    2.9                  < 0.1
                      RR Lyrae (63)              1.6                  < 0.1
                      Truncated RR Lyrae (58)    0.7                  < 0.1
Double Exponential    Whole (232)                < 0.1                < 0.1
                      Truncated (223)            < 0.1                < 0.1
                      Cepheids (81)              1.5                  < 0.1
                      Truncated Cepheids (75)    3.7                  < 0.1
                      RR Lyrae (63)              1.3                  < 0.1
                      Truncated RR Lyrae (58)    0.6                  < 0.1
n = 39 Student’s t    Whole (232)                < 0.1                21
n = 13 Student’s t    Truncated (223)            < 0.1                28
n = 3 Student’s t     Cepheids (81)              0.9                  26
n = 22 Student’s t    Truncated Cepheids (75)    2.7                  25
n = 59 Student’s t    RR Lyrae (63)              1.6                  34
n = 94 Student’s t    Truncated RR Lyrae (58)    0.6                  37

a For the Student’s t case, the n corresponding with the best probability is displayed.
b The probability that the data set is compatible with the assumed distribution.
Fig. 2. Histogram of the error distributions in $|N_\sigma| = 0.1$ bins (with the exception of
the last, truncated, bin with $5 < |N_\sigma| < 6$ that contains the number of measurements
to $|N_\sigma| = 8$). The solid black line represents the expected Gaussian probabilities for 232
measurements and the dotted blue (dashed red) line is the number of $|N_\sigma|$ values in each
bin for the weighted mean (median) case.


The final non-Gaussian distribution function we consider is the double exponential, or 
Laplace distribution. Again, we do not find a probability of greater than 0.1%. 

To visually clarify the difference between the weighted mean and median statistics
$|N_\sigma|$ histograms, we plot them in bins of $|N_\sigma| = 0.1$, see Fig. 2. We see that the
weighted mean case is closer to Gaussian near the peak, but has an extended tail. This suggests
the existence of unaccounted-for systematic errors. For the median case the peak is much higher
than expected for a Gaussian, and the distribution drops off with increasing $|N_\sigma|$ more
rapidly than expected for a Gaussian. This may be a sign of correlations between measurements
or possibly publication bias.

Table 2 is a more compact way of displaying some of this information. For a Gaussian
distribution of 232 values, there should be zero measurements with $|N_\sigma| > 4$ while the
observed weighted mean case has nine. For illustrative purposes, we truncate this distribution
by removing all values with $|N_\sigma| > 4$.^10 This leaves us with 223 values and an unchanged
central estimate of $(m - M)_0 = 18.49 \pm 3.38 \times 10^{-3}$ mag.^11 The spread of the values
can be seen in Fig. 3. We also find that 68.3% of the values fall within
$-1.55 < N_\sigma < 1.05$ and 95.4% fall within $-3.66 < N_\sigma < 2.06$. For the absolute
case, $|N_\sigma| < 1.23$ and $|N_\sigma| < 3.03$ for 68.3% and 95.4% of the values
respectively. In terms of percentages, 61.0% and 85.2% of the measurements fall within
$|N_\sigma| < 1$ and $|N_\sigma| < 2$ respectively. We note that when truncated, the normal
standard deviation becomes $\sigma = 0.125$ while the symmetrized error for the median case is
$\sigma = 0.126$. It would appear that after eliminating $|N_\sigma| > 4$, the median and
weighted mean cases converge. However, we do utilize a weighted mean rather than the
standard mean, as the errors for the measurements are not the same, and the weighted mean
and median statistics error still do not converge even in the truncated case.
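The truncation step described above can be sketched in Python as a single pass: compute the weighted mean, drop measurements whose $|N_\sigma|$ exceeds the cut, and recompute the central estimate. The function names and the toy data are hypothetical, and whether the truncation was applied once or iterated in the analysis is not stated, so the single pass here is an assumption.

```python
import math

def weighted_mean(values, errors):
    """Return (D_wm, sigma_wm) for inverse-variance weights."""
    weights = [1.0 / e ** 2 for e in errors]
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, values)) / total, total ** -0.5

def truncate(values, errors, cut):
    """Drop measurements with |N_sigma| > cut relative to the weighted mean."""
    center, center_error = weighted_mean(values, errors)
    kept = [(d, e) for d, e in zip(values, errors)
            if abs(d - center) / math.sqrt(e ** 2 + center_error ** 2) <= cut]
    kept_values = [d for d, _ in kept]
    kept_errors = [e for _, e in kept]
    return kept_values, kept_errors

# Hypothetical toy data: five consistent values and one outlier (mag).
moduli = [18.5] * 5 + [19.5]
sigmas = [0.1] * 6
kept_values, kept_errors = truncate(moduli, sigmas, cut=4.0)
new_center, new_error = weighted_mean(kept_values, kept_errors)
```

In this toy case the single outlier is removed and the recomputed weighted mean returns to the value shared by the remaining measurements.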




Fig. 3. Histograms of the error distribution in half standard deviation bins for the
truncated weighted mean distribution ($|N_\sigma| < 4$). The left plot uses the weighted mean of
the 223 measurements as the central estimate to show the signed deviation. The right plot shows
the symmetrized absolute $N_\sigma$. The smooth curve in each panel is the best-fit Gaussian.


10 For completeness, we also did a median statistics analysis of this truncated data. As expected, we 
found that removing these nine measurements does not increase probabilities or change the median statistics 
results, which shows the robustness of median statistics. 

11 The 223 values also give $\chi^2 = 1.90$ and $N_\sigma = 8.01$ (the number of standard deviations that
$\chi$ deviates from unity).
Table 2. Expected Gaussian and Observed Numbers of $|N_\sigma|$

Tracer       Values^a   |N_sigma|   Expected^b   Observed (WM)^c   Observed (Med)^c

All Types    232        > 0.5       143          151               96
                        > 1         74           101               45
                        > 1.5       31           65                15
                        > 2         11           42                7
                        > 2.5       3            31                1
                        > 3         1            23                0
                        > 4         0            9                 0
Cepheids     81         > 0.5       50           48                34
                        > 1         26           35                17
                        > 1.5       11           20                4
                        > 2         4            10                2
                        > 2.5       1            7                 1
                        > 3         0            6                 0
RR Lyrae     63         > 0.5       39           31                20
                        > 1         20           20                11
                        > 1.5       8            12                4
                        > 2         3            7                 1
                        > 2.5       0            5                 0
                        > 3         0            3                 0

a The number of distance moduli measurements used in our analysis.

b The number of values expected to fall outside of the corresponding $|N_\sigma|$ for
a Gaussian distribution of total number listed in Col (2).

c The observed number of values outside of the corresponding $|N_\sigma|$.
It is also of interest to determine the probabilities of the four well-known distributions
for the new truncated weighted mean case. The distributions have a probability of < 0.1%
for the Gaussian, Cauchy, and double exponential distributions. The truncation to
$|N_\sigma| < 4$ improves the probability for the Student’s t case, slightly increasing it to
28% compared to the 21% for the non-truncated case. Since the probability does not improve for
the Gaussian fit, this still indicates non-Gaussianity in the measurement distribution, because
of the larger than expected $|N_\sigma| > 2$ and $|N_\sigma| > 3$ tails.


4. Error Distributions for 81 Cepheid Values 


It is of interest to also investigate the spread of individual tracer measurements. We
first consider the 81 Cepheid distance moduli values tabulated by de Grijs et al. (2014). For
the weighted mean case we find a central estimate of
$(m - M)_0 = 18.52 \pm 6.52 \times 10^{-3}$ mag.^12 For signed $N_\sigma$, 68.3% of the values
fall within $-1.04 < N_\sigma < 1.73$ and 95.4% fall within $-1.78 < N_\sigma < 5.68$. For
absolute $N_\sigma$, 68.3% and 95.4% of the values fall within $|N_\sigma| < 1.31$ and
$|N_\sigma| < 4.13$ respectively, while 56.8% of the values fall within $|N_\sigma| < 1$ and
87.7% fall within $|N_\sigma| < 2$.


For the median case we find a central estimate of $(m - M)_0 = 18.50$ mag with a $1\sigma$
range of 18.37 mag $< (m - M)_0 <$ 18.60 mag. For signed $N_\sigma$, 68.3% of the values fall
within $-0.67 < N_\sigma < 0.73$ and 95.4% fall within $-1.28 < N_\sigma < 1.76$. For absolute
$N_\sigma$, 68.3% and 95.4% of the values fall within $|N_\sigma| < 0.71$ and
$|N_\sigma| < 1.63$ respectively, while 79.0% of the values fall within $|N_\sigma| < 1$ and
97.5% fall within $|N_\sigma| < 2$. We note that for the median case, the error distribution is
tighter when we use only the 81 Cepheid values compared to the distribution from all 232
measurements.


The signed and absolute $N_\sigma$ distribution for the Cepheid tracers can be seen in Fig. 4.
One can see from the top two plots that there is an extended tail in the distribution for the
weighted mean case, and from the lower two plots, a narrower distribution for the median
case. We also plot $|N_\sigma|$ in bins of 0.1, see Fig. 5. This figure again illustrates the
higher than expected peak and rapid drop off of $|N_\sigma|$ for the median case, and the
extended tails for the weighted mean case. To numerically describe these features, we can again
use the four well-known distributions. The best probability comes from the Student’s t
distribution for the median case with a probability of 26% (see Table 1).


12 We also find $\chi^2 = 2.66$ and $N_\sigma = 7.96$, which is the number of standard deviations that
$\chi$ deviates from unity.






Fig. 4. Histograms of the error distribution in half standard deviation bins for Cepheids.
The top (bottom) row uses the weighted mean (median) of the 81 measurements as the
central estimate. The left column shows the signed deviation, where positive (negative)
$N_\sigma$ represents a value that is greater (less) than the central estimate. The right column
shows the absolute symmetrized distributions. The smooth curve in each panel is the best-fit
Gaussian.
Fig. 5. Histogram of the error distributions using Cepheid tracers in $|N_\sigma| = 0.1$ bins
(with the exception of the last bin with $5 < |N_\sigma| < 6$). The solid black line represents
the expected Gaussian probabilities for 81 measurements and the dotted blue (dashed red) line
is the number of $|N_\sigma|$ values in each bin for the weighted mean (median) case.


It is of interest to also truncate the Cepheid sub-sample, and we do so by removing
all values with $|N_\sigma| > 3$ as there should be none for a normally distributed set of 81
measurements (see Table 2). For the weighted mean case, with a new central estimate of
$(m - M)_0 = 18.51 \pm 7.27 \times 10^{-3}$ mag, the distribution slightly tightens. For signed
$N_\sigma$, 68.3% of the values fall within $-0.931 < N_\sigma < 1.48$ and 95.4% fall within
$-2.43 < N_\sigma < 2.38$. For absolute $N_\sigma$, 68.3% and 95.4% of the values fall within
$|N_\sigma| < 1.11$ and $|N_\sigma| < 2.23$ respectively, while 65.3% of the values fall within
$|N_\sigma| < 1$ and 94.7% fall within $|N_\sigma| < 2$. As for the median case, the
distribution does not significantly change (as expected with median statistics). Table 1 shows
the probabilities for the new truncated Cepheid set. The probability for the un-binned KS test
only slightly increases to 2.7% while the binned probability does not significantly change.










































5. Error Distribution for 63 RR Lyrae Values 

de Grijs et al. (2014) also tabulate 63 RR Lyrae distance moduli,^13 whose error
distribution we study here. We find a weighted mean central estimate of
$(m - M)_0 = 18.48 \pm 1.03 \times 10^{-2}$ mag. For signed $N_\sigma$, 68.3% of the values
fall within $-0.83 < N_\sigma < 1.15$ and 95.4% fall within $-1.75 < N_\sigma < 3.11$. For
absolute $N_\sigma$, 68.3% and 95.4% of the values fall within $|N_\sigma| < 1.00$ and
$|N_\sigma| < 3.11$ respectively, while 68.3% of the values fall within $|N_\sigma| < 1$ and
88.9% fall within $|N_\sigma| < 2$.

For the median case we find a central estimate of $(m - M)_0 = 18.47$ mag with a $1\sigma$
range of 18.29 mag $< (m - M)_0 <$ 18.55 mag. For signed $N_\sigma$, 68.3% of the values fall
within $-0.65 < N_\sigma < 0.48$ and 95.4% fall within $-1.50 < N_\sigma < 1.03$. For absolute
$N_\sigma$, 68.3% and 95.4% of the values fall within $|N_\sigma| < 0.50$ and
$|N_\sigma| < 1.56$ respectively, while 82.5% of the values fall within $|N_\sigma| < 1$ and
98.4% fall within $|N_\sigma| < 2$.

We plot $|N_\sigma|$ in bins of 0.1, see Fig. 6, and the spread of values can be seen in
Fig. 7. In this case, the non-Gaussianity is not as visually striking. We also fit the RR Lyrae
measurements to the four distributions. The Student’s t distribution gives the largest
probability of 34% (see Table 1).

We also truncated the RR Lyrae sub-sample by only including values with $|N_\sigma| < 2.5$,
as there should be none greater than this for a set of 63 normally distributed measurements
(see Table 2). In doing so the weighted mean error distribution was slightly tightened.^14
We find a slightly changed central estimate of $(m - M)_0 = 18.49 \pm 1.10 \times 10^{-2}$ mag.
For signed $N_\sigma$, 68.3% of the values fall within $-0.68 < N_\sigma < 0.79$ and 95.4% fall
within $-2.49 < N_\sigma < 1.81$. For absolute $N_\sigma$, 68.3% and 95.4% of the values fall
within $|N_\sigma| < 0.75$ and $|N_\sigma| < 1.87$ respectively, while 75.9% of the values fall
within $|N_\sigma| < 1$ and 98.3% fall within $|N_\sigma| < 2$. When fitting the sub-sample
error distribution to well-known distributions, we find a slight increase in probabilities
given by the KS test (see Table 1). We find that the highest probability of 37% is given by an
$n = 94$ Student’s t distribution, which slightly increased from 34%.


13 Three RR Lyrae values in de Grijs et al. (2014) quote a zero error and were not used here.


14 The median case did not significantly change.
Fig. 6. Histogram of the error distributions using RR Lyrae tracers in $|N_\sigma| = 0.1$ bins.
The solid black line represents the expected Gaussian probabilities for 63 measurements and
the dotted blue (dashed red) line is the number of $|N_\sigma|$ values in each bin for the
weighted mean (median) case.

Fig. 7. Histograms of the error distribution in half standard deviation bins for RR Lyrae.
The top (bottom) row uses the weighted mean (median) of the 63 measurements as the
central estimate. The left column shows the signed deviation, where positive (negative)
$N_\sigma$ represents a value that is greater (less) than the central estimate. The right column
shows the absolute symmetrized distributions. The smooth curve in each panel is the best-fit
Gaussian.
6. SMC Distance Moduli


We have also analyzed 247 SMC distance moduli measurements^15 compiled by de Grijs & Bono
(2015) and find similar results to those given by LMC distance moduli measurements.^16 For
the weighted mean case, which gives a central estimate of
$(m - M)_0 = 18.93 \pm 2.38 \times 10^{-3}$ mag, we find extended tails in the error
distribution. For signed $N_\sigma$, we find that 68.3% and 95.4% of the measurements fall
within $-2.01 < N_\sigma < 1.91$ and $-6.59 < N_\sigma < 4.76$ respectively. For the unsigned
$N_\sigma$ we find that 68.3% and 95.4% of the measurements are within $|N_\sigma| < 1.91$ and
$|N_\sigma| < 5.26$ respectively. Conversely, 45.8% of the measurements fall within
$|N_\sigma| < 1$ and 70.5% fall within $|N_\sigma| < 2$. These wider tails suggest
unaccounted-for systematic errors.

For the median case, which gives a central estimate of (m - M)_0 = 18.94 mag with a
1σ range of 18.81 mag < (m - M)_0 < 19.08 mag, the distribution is narrower than expected
for a Gaussian. For signed N_σ, we find that 68.3% and 95.4% of the measurements fall within
-0.80 < N_σ < 0.78 and -1.60 < N_σ < 2.91, respectively. For the unsigned |N_σ|, we find that
68.3% and 95.4% of the measurements are within |N_σ| < 0.79 and |N_σ| < 1.68, respectively.
Conversely, 78.5% of the measurements fall within |N_σ| < 1 and 96.8% fall within |N_σ| < 2.
This narrow distribution indicates the presence of correlations between measurements
(especially within similar tracer types), as suggested by de Grijs & Bono (2015).
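The median central estimate and its 1σ range can be sketched in a few lines of Python. This section does not spell out how the 1σ range is constructed, so the sketch below assumes one plausible reading: the range is the central interval containing 68.3% of the measurements about the median. The function names and the linear-interpolation quantile are illustrative choices, not the authors' implementation.

```python
import math

def quantile(sorted_values, q):
    """Linearly interpolated quantile (0 <= q <= 1) of an already sorted list."""
    pos = q * (len(sorted_values) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac

def median_with_range(values, level=0.683):
    """Median plus the central interval holding `level` of the measurements.

    Assumption: the quoted 1-sigma range is this central interval;
    individual measurement errors are not used.
    """
    xs = sorted(values)
    median = quantile(xs, 0.5)
    lower = quantile(xs, 0.5 - level / 2.0)
    upper = quantile(xs, 0.5 + level / 2.0)
    return median, lower, upper

def symmetrized_error(lower, upper):
    """Half-width of the range, used as a single symmetric 1-sigma error."""
    return 0.5 * (upper - lower)
```

Under this reading, the range 18.81 mag to 19.08 mag quoted above has a half-width of 0.135 mag, in line with the symmetrized 0.14 mag error given in the abstract.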


We also examine the distributions given by the two tracer types with the greatest number of
measurements: Cepheids (101 measurements) and RR Lyrae (30). For the Cepheid weighted
mean case, we find a central estimate of (m - M)_0 = 18.98 ± 4.17 × 10^-3 mag. 68.3% and
95.4% of the measurements are within -1.55 < N_σ < 0.98 and -6.36 < N_σ < 1.76 for
signed N_σ. For the absolute case, |N_σ| < 1.26 and |N_σ| < 4.02 for 68.3% and 95.4% of
the measurements, respectively. Alternatively, 56.5% and 86.1% of the measurements fall
within |N_σ| < 1 and |N_σ| < 2, respectively. For the median case, we find a central estimate
of (m - M)_0 = 18.98 mag with a 1σ range of 18.83 mag < (m - M)_0 < 19.13 mag. The
distribution shows that 68.3% and 95.4% of the measurements are within -0.83 < N_σ < 0.68
and -1.91 < N_σ < 1.46 for signed N_σ. For the absolute case, |N_σ| < 0.81 and |N_σ| < 1.54 for
68.3% and 95.4% of the measurements, respectively. Alternatively, 82.2% and 98.0% of the
measurements fall within |N_σ| < 1 and |N_σ| < 2, respectively. Again, we see a wider (narrower)
than Gaussian distribution for the weighted mean (median) case.


15 de Grijs & Bono (2015) collected 304 estimates, but we have only included measurements with non-zero error.


16 We thank Jacob Peyton for helping with this analysis. 
For the sub-sample of RR Lyrae tracer types, we notice similar distributions. For the
weighted mean case, we find a central estimate of (m - M)_0 = 18.86 ± 5.20 × 10^-3 mag. 68.3%
and 95.4% of the measurements are within -2.40 < N_σ < 0.88 and -2.40 < N_σ < 1.47 for
signed N_σ.^17 For the absolute case, |N_σ| < 1.49 and |N_σ| < 3.26 for 68.3% and 95.4% of
the measurements, respectively. Alternatively, 50.0% and 80.0% of the measurements fall
within |N_σ| < 1 and |N_σ| < 2, respectively. For the median case, we find a central estimate
of (m - M)_0 = 18.90 mag with a 1σ range of 18.74 mag < (m - M)_0 < 19.06 mag. The
distribution shows that 68.3% and 95.4% of the measurements are within -0.51 < N_σ < 0.86
and -1.13 < N_σ < 1.24 for signed N_σ. For the absolute case, |N_σ| < 0.65 and |N_σ| < 1.28
for 68.3% and 95.4% of the measurements, respectively. Alternatively, 83.3% and 100% of
the measurements fall within |N_σ| < 1 and |N_σ| < 2, respectively.

We also attempt to fit the error distributions to four well-known distributions. The
probabilities, found by using the KS test, are given in Table 3. We find that all distributions
are fit best by a Student’s t distribution. The whole (247) distribution is best fit by an n = 1
Student’s t with a probability of 74%.
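The un-binned KS comparison can be illustrated with a short, self-contained Python sketch. It is a simplified stand-in rather than the procedure used for Table 3: it assumes unit-width reference distributions (no fitted scale factor), uses the standard asymptotic approximation for the Kolmogorov probability, and covers only the Gaussian, Cauchy and double exponential cases (an n = 1 Student's t is identical to a Cauchy distribution). All function names are hypothetical.

```python
import math

def gaussian_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cauchy_cdf(x):
    # Also the CDF of an n = 1 Student's t distribution.
    return 0.5 + math.atan(x) / math.pi

def double_exponential_cdf(x):
    return 0.5 * math.exp(x) if x < 0 else 1.0 - 0.5 * math.exp(-x)

def ks_statistic(samples, cdf):
    """Maximum distance D between the empirical CDF and the model CDF."""
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, abs(f - i / n), abs((i + 1) / n - f))
    return d

def ks_probability(d, n):
    """Asymptotic probability of a KS distance at least as large as d."""
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the result is 1 to high precision
    total = 0.0
    for j in range(1, 101):
        term = 2.0 * (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-12:
            break
    return min(max(total, 0.0), 1.0)
```

Calling `ks_probability(ks_statistic(n_sigma, cauchy_cdf), len(n_sigma))` on a list of N_σ values gives a probability comparable in spirit to the un-binned column of Table 3; a low value means the data are unlikely to be drawn from the assumed distribution.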


7. Conclusion 


We have studied the error distributions of LMC distance moduli compiled by de Grijs et al.
(2014). We find that the error distributions are non-Gaussian with extended tails when using
a weighted mean central estimate, probably as a consequence of unaccounted-for systematic
errors. In fact, only 53 of the 237 values tabulated by de Grijs et al. (2014) have a non-zero
systematic error. Because the weighted mean error distributions are non-Gaussian, it is more
appropriate to use the median statistics error distribution.

The median statistics error distributions are narrower than Gaussian, supporting the
conclusion of de Grijs et al. (2014), who argue that this is a consequence of correlations
between some of the measurements, with publication bias possibly also contributing mildly.


We thank R. de Grijs, J. Wicker, and G. Bono for providing us with the data. In 
addition, we thank R. de Grijs, J. Wicker, G. Horton-Smith, and A. Ivanov for valuable 
comments and advice. Finally, we thank Jacob Peyton for the SMC distance moduli analysis. 


17 The two lower bounds are the same due to the distribution being weighted towards the positive N_σ side
(there are more values with N_σ > 0). Symmetrizing this distribution gives a clearer understanding of the
error.











Table 3. K-S Test Probabilities

Function^a            Data Set          Un-binned            Binned
                                        Probability (%)^b    Probability (%)^b

Gaussian              Whole (247)       < 0.1                < 0.1
                      Cepheids (101)    < 0.1                < 0.1
                      RR Lyrae (30)     8.4                  20
Cauchy                Whole (247)       < 0.1                < 0.1
                      Cepheids (101)    < 0.1                < 0.1
                      RR Lyrae (30)     33                   15
Double Exponential    Whole (247)       < 0.1                < 0.1
                      Cepheids (101)    < 0.1                < 0.1
                      RR Lyrae (30)     36                   20
n = 1 Student’s t     Whole (247)       59                   < 0.1
n = 1 Student’s t     Cepheids (101)    74                   < 0.1
n = 11 Student’s t    RR Lyrae (30)     22                   31

a For the Student’s t case, the n corresponding to the best probability
is displayed.

b The probability that the data set is compatible with the assumed distribution.







This work was supported in part by DOE grant DEFG 03-99EP41093 and NSF grant AST-1109275.